Report for Data Mining Cup 2002

Author

  • Zengchang Qin

Abstract

This research report was written for participation in DMC 2002 (see [8]). It follows the CRISP-DM (CRoss-Industry Standard Process for Data Mining) [1] methodology: Business Understanding, Data Understanding, Data Preparation, Modeling and Evaluation. Two popular data mining software products are used: DISCOVERER, mainly for data preparation and modeling, and WEKA, for feature selection. The last part of the report also offers some personal, intuitive observations on real-world data mining.

Similar resources

Evaluation of text data mining for database curation: lessons learned from the KDD Challenge Cup

MOTIVATION The biological literature is a major repository of knowledge. Many biological databases draw much of their content from a careful curation of this literature. However, as the volume of literature increases, the burden of curation increases. Text mining may provide useful tools to assist in the curation process. To date, the lack of standards has made it impossible to determine whethe...

Predicting customer behaviour: The University of Melbourne's KDD Cup report

We discuss the challenges of the 2009 KDD Cup along with our ideas and methodologies for modelling the problem. The main stages included aggressive nonparametric feature selection, careful treatment of categorical variables and tuning a gradient boosting machine under Bernoulli loss with trees.

Rule-based Extraction of Experimental Evidence in the Biomedical Domain: the KDD Cup 2002 (Task 1)

Below we describe the winning system that we built for the KDD Cup 2002 Task 1 competition. Our system is a Rule-based Information Extraction (IE) system. It combines pattern matching, Natural Language Processing (NLP) tools, semantic constraints based on the domain and the specific task, and a post-processing stage for making the final curation decision based on the various evidence (positive ...

Using Data and Text Mining Techniques for Yeast Gene Regulation Prediction: A Case Study

We focus on the problem of predicting yeast gene regulation experiments. In order to construct a good solution, we study combinations of different methods that are not yet to be found in any single data mining application. We describe our approach to propositionalizing the given relational data that describes the interaction among proteins. We study how we can exploit a large archive of scienti...

Bennett Netflix 100 Winchester Circle

INTRODUCTION The KDD Cup is the oldest of the many data mining competitions that are now popular [1]. It is an integral part of the annual ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD). In 2007, the traditional KDD Cup competition was augmented with a workshop with a focus on the concurrently active Netflix Prize competition [2]. The KDD Cup itself in 2007 con...

Publication date: 2003